Abstract

A simple framework for reasoning under uncertainty and intervention
is introduced. This is achieved in three steps. First, logic is restated in
set-theoretic terms to obtain a framework for reasoning under certainty.
Second, this framework is extended to model reasoning under uncertainty.
Finally, causal spaces are introduced, and it is shown how they provide enough
information to model knowledge containing causal information about the
world.



1 Bayesian Probability Theory 

It is advantageous to endow plausibilities with an explanatory framework that 
has a logically intuitive appeal. Such a framework is Bayesian probability theory. 
Simply put, Bayesian probability theory is a framework that extends logic for 
reasoning under uncertainty. 

1.1 Reasoning under Certainty 

Logic is the most important framework of reasoning (under certainty). Here, it
is rephrased in set-theoretic terms.¹ As will be seen, this facilitates its extension
to a framework for reasoning under uncertainty.

Let $\Omega$ be a set of outcomes, which is assumed to be finite for simplicity. A
subset $A \subset \Omega$ is an event. Let $^c$, $\cup$ and $\cap$ be the set-operations of complement,
union and intersection respectively. Let $\mathcal{F}$ be an algebra, i.e. a set of events
obeying the axioms

A1. $\mathcal{F} \neq \emptyset$.

A2. $A \in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$.

A3. $A, B \in \mathcal{F} \Rightarrow A \cup B \in \mathcal{F}$.



1 Strictly speaking, this set-theoretic logic is "a logic within logic", since set theory is based 
on standard logic. 
In this framework, an outcome $\omega \in \Omega$ is a state of affairs and an event $A \in \mathcal{F}$
is a proposition. Hence, a singleton $\{\omega\} \in \mathcal{F}$ is an irreducible (i.e. atomic)
proposition about the world. The set-operations $^c$, $\cup$ and $\cap$ correspond to
the logical connectives of $\neg$ (negation), $\vee$ (disjunction) and $\wedge$ (conjunction)
respectively. They allow the construction of complex propositions from simpler
ones. An algebra is a system of propositions that is closed under negation and
disjunction (and hence is closed under conjunction as well), i.e. it comprises all
propositions that the reasoner might entertain.

Remark 1. A trivial consequence of the axioms is that both the universal event
$\Omega$ and the impossible event $\emptyset$ are in $\mathcal{F}$.

The objective of logic is to allow the reasoner to conclude the veracity of
events given information. Let $V := \{1, 0, ?\}$ be the set of truth states, where
1 is true, 0 is false, and ? is uncertain (but known to be either true or false).
Of these, $\{1, 0\}$ are called truth values. The truth function is the set
function $\mathbf{T}$ over $\mathcal{F} \times \mathcal{F}$ defined as

\[
\mathbf{T}(A|B) :=
\begin{cases}
1 & \text{if } B \subset A, \\
0 & \text{if } A \cap B = \emptyset, \\
? & \text{else.}
\end{cases}
\]

Furthermore, define the shorthand $\mathbf{T}(A) := \mathbf{T}(A|\Omega)$. The quantity $\mathbf{T}(A|B)$
stands for the "truth value of event $A$ given that event $B$ is true". Accordingly,
the knowledge of the reasoner about the facts of the world is represented by his
truth function and his algebra. From his point of view, a proposition can be
either true, false or uncertain (i.e. having an unresolved truth value given his
knowledge). Understanding the definition of the truth function is straightforward.
Claiming that an event $B \in \mathcal{F}$ is true means that one of its members
$\omega \in B$ is the current outcome/state of affairs. Hence the veracity of $A$ given
$B$ is evaluated as follows (Figure 1): if $A$ contains every outcome in $B$ then it
must be true as well; if $A$ is known not to contain any of $B$'s outcomes then it
must be false; and if $A$ contains only part of $B$ then it cannot be resolved, since
knowing that $\omega \in B$ implies neither $\omega \in A$ nor $\omega \in A^c$. The definition
of a truth space follows.

Definition 1 (Truth Space). A truth space is a tuple $(\Omega, \mathcal{F}, \mathbf{T})$ where: $\Omega$ is a
set of outcomes, $\mathcal{F}$ is an algebra over $\Omega$ and $\mathbf{T}: \mathcal{F} \times \mathcal{F} \to V$ is a truth function.

The intuitive meaning of a truth space is as follows. Nature arbitrarily
selects an outcome $\omega \in \Omega$. (This choice is not governed by a generative law.)
Subsequently, the reasoner performs a measurement: he chooses a set $B$ and
nature reveals to him whether $\omega \in B$ or not. Accordingly, the reasoner infers
the veracity of any event $A \in \mathcal{F}$ by evaluating either $\mathbf{T}(A|B)$ (if $\omega \in B$) or
$\mathbf{T}(A|B^c)$ (if $\omega \notin B$).

Several measurements are combined as a conjunction. Thus, if the reasoner
learns that $\omega$ is in $B_1, B_2, \ldots,$ and $B_t$ after performing $t$ measurements, then
the truth value is $\mathbf{T}(A|B_1 \cap \cdots \cap B_t)$ for any $A \in \mathcal{F}$.
[Figure 1 panels; surviving panel labels: (c) $\mathbf{T}(C|A) = 1$, (d) $\mathbf{T}(B|A) = ?$]

Figure 1: A truth space. It is known that the true outcome $\omega \in \Omega$ is in $A$.
Hence, (a) the event $A$ is true and (b) its complement $A^c$ is false. (c) Any event
that contains a true event is true as well. (d) An event that contains only part
of a true event is uncertain.



Remark 2. Knowing that $\omega \in \Omega$ does not resolve uncertainty, i.e. $\mathbf{T}(A|\Omega) = ?$
for any $A \in \mathcal{F} \setminus \{\Omega, \emptyset\}$, while knowing that $\omega \in \{\omega\}$ resolves all uncertainty,
i.e. $\mathbf{T}(A|\{\omega\}) \in \{0, 1\}$ for any $A \in \mathcal{F}$.

Remark 3. The set relation $B \subset A$ corresponds to the logical relation $B \Rightarrow A$.
Since an algebra is an encoding of how sets are contained within each other, it
should be clear that an algebra is essentially a system of implications.
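
The truth function can be made concrete with a few lines of code. The following is a minimal sketch, assuming events are represented as Python `frozenset`s of outcomes and that the conditioning event is non-empty; the names `truth` and `UNCERTAIN` are illustrative and do not come from the text.

```python
UNCERTAIN = "?"  # third truth state: known to be true or false, but unresolved


def truth(a, b):
    """Truth state of event `a` given that event `b` is true.

    Assumption of this sketch: events are sets of outcomes and `b` is non-empty.
    """
    a, b = frozenset(a), frozenset(b)
    if b <= a:           # every outcome of B lies in A: A must be true
        return 1
    if not (a & b):      # A contains none of B's outcomes: A must be false
        return 0
    return UNCERTAIN     # A covers only part of B: cannot be resolved


omega = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})

print(truth(A, omega))               # -> ?  (knowing only that the outcome is in omega)
print(truth(A, {1}))                 # -> 1
print(truth(A, {3, 4}))              # -> 0
# Two measurements combine as a conjunction: {1, 3} and {1, 2, 4} intersect in {1}.
print(truth(A, {1, 3} & {1, 2, 4}))  # -> 1
```

The last call illustrates how successive measurements shrink the conditioning event until the truth value of `A` is resolved.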

1.2 Reasoning under Uncertainty 

Unlike logic, Bayesian probability theory allows reasoning under uncertainty.
To this end, it provides a consistent mechanism to replace the uncertainty
state ? with a numerical value in the interval [0, 1] representing degrees of truth,
belief or plausibility.




Figure 2: Extension of Truth Function.

The goal is to find a suitable definition of a quantity $\mathbf{B}(A|B)$ meaning "the
degree of belief in event $A$ given that event $B$ is true" that is consistent with the
truth function when it is certain, i.e. $\mathbf{B}(A|B) := \mathbf{T}(A|B)$ if $\mathbf{T}(A|B) \in \{0, 1\}$.
Consider the three situations in Figure 2. (a) In the case $A \cap B = \emptyset$, we
impose $\mathbf{B}(A|B) := \mathbf{T}(A|B) = 0$. (b) In the case $B \subset A$, we impose
$\mathbf{B}(A|B) := \mathbf{T}(A|B) = 1$. (c) In the intermediate case where $\mathbf{T}(A|B) = ?$, the event $A$ only
partially covers the members of $B$. If one interprets the quantity $\mathbf{B}(C|D)$ as
"the fraction of $D$ contained in $C$", then one can characterize $\mathbf{B}(A|B)$ with the
relation

\[
\mathbf{B}(A|B) = \frac{\mathbf{B}(A \cap B|\Omega)}{\mathbf{B}(B|\Omega)}
\]

as long as $\mathbf{B}(B|\Omega) > 0$. It is easy to see that this formula generalizes correctly
to the border cases, since $\mathbf{B}(A|B) = \frac{\mathbf{B}(\emptyset|\Omega)}{\mathbf{B}(B|\Omega)} = 0$ when $A \cap B = \emptyset$ and
$\mathbf{B}(A|B) = \frac{\mathbf{B}(B|\Omega)}{\mathbf{B}(B|\Omega)} = 1$ when $B \subset A$. Noting that $B = B \cap \Omega$ and rearranging terms, one
gets

\[
\mathbf{B}(A \cap B|\Omega) = \mathbf{B}(B|\Omega) \, \mathbf{B}(A|B \cap \Omega).
\]

This relation should hold under any restriction to a "universal" set $C \in \mathcal{F}$, not
only when it is restricted to $\Omega$. Thus, replacing $\Omega$ by $C$ one obtains

\[
\mathbf{B}(A \cap B|C) = \mathbf{B}(B|C) \, \mathbf{B}(A|B \cap C),
\]

which is known as the product rule for beliefs. Following a similar reasoning,
we impose that for any event $A \in \mathcal{F}$, the degrees of belief in $A$ and in
its complement $A^c$ must sum to one under any condition $B$, i.e.

\[
\mathbf{B}(A|B) + \mathbf{B}(A^c|B) = 1,
\]

which is known as the sum rule for beliefs. In summary, we impose the following
axioms for beliefs.

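Definition 2 (Belief axioms). Let $\Omega$ be a set of outcomes and let $\mathcal{F}$ be an
algebra over $\Omega$. A set function $\mathbf{B}$ over $\mathcal{F} \times \mathcal{F}$ is a belief function iff

B1. $A, B \in \mathcal{F}$, $\mathbf{B}(A|B) \in [0, 1]$.

B2. $A, B \in \mathcal{F}$, $\mathbf{B}(A|B) = 1$ if $B \subset A$.

B3. $A, B \in \mathcal{F}$, $\mathbf{B}(A|B) = 0$ if $A \cap B = \emptyset$.

B4. $A, B \in \mathcal{F}$, $\mathbf{B}(A|B) + \mathbf{B}(A^c|B) = 1$.

B5. $A, B, C \in \mathcal{F}$, $\mathbf{B}(A \cap B|C) = \mathbf{B}(A|C) \, \mathbf{B}(B|A \cap C)$.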

Furthermore, define the shorthand $\mathbf{B}(A) := \mathbf{B}(A|\Omega)$. Axiom B1 states that
degrees of belief are real values in the unit interval [0, 1]. Axioms B2 and B3
equate the belief and the truth function under certainty. Axioms B4 and B5 are
the structural requirements under uncertainty discussed above. Accordingly,
one defines a belief space as follows.

Definition 3 (Belief Space). A belief space is a tuple $(\Omega, \mathcal{F}, \mathbf{B})$ where: $\Omega$ is
a set of outcomes, $\mathcal{F}$ is an algebra over $\Omega$ and $\mathbf{B}: \mathcal{F} \times \mathcal{F} \to [0, 1]$ is a belief
function.

The intuitive meaning of a belief space is analogous to a truth space. Nature
arbitrarily selects an outcome $\omega \in \Omega$. Subsequently, the reasoner performs a
measurement: he chooses a set $B$ and nature reveals to him whether $\omega \in B$ or
not. Accordingly, the reasoner infers the degree of belief in any event $A \in \mathcal{F}$
by evaluating either $\mathbf{B}(A|B)$ (if $\omega \in B$) or $\mathbf{B}(A|B^c)$ (if $\omega \notin B$).

Remark 4. The word "subsequently", which has now been emphasized for the
second time, is crucial. When the reasoner performs his measurements, the
outcome is already determined.

An easy but fundamental result is that the axioms of belief are equivalent 
to the axioms of probability. This simple observation is what constitutes the 
foundation of Bayesian probability theory. 
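
As an illustration of one direction of this equivalence, the sketch below builds a conditional belief from strictly positive outcome weights as a ratio of masses and checks Axioms B1 to B5 by brute force on a three-outcome space. The helper names (`events`, `make_belief`, `satisfies_axioms`) and the weights are hypothetical, and the strict positivity of the weights is an assumption that sidesteps conditioning on events with zero belief.

```python
from fractions import Fraction
from itertools import combinations


def events(omega):
    """All subsets of omega, i.e. the largest algebra over omega."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make_belief(weights):
    """Belief B(A|B) as the fraction of B's mass that lies in A.

    Assumption: all weights are strictly positive, so every non-empty
    conditioning event has positive mass.
    """
    def belief(a, b):
        mass_b = sum(weights[w] for w in b)
        return sum(weights[w] for w in a & b) / mass_b
    return belief


def satisfies_axioms(belief, omega):
    """Brute-force check of B1-B5 over all events with non-empty conditions."""
    omega = frozenset(omega)
    nonempty = [e for e in events(omega) if e]
    for a in events(omega):
        for b in nonempty:
            x = belief(a, b)
            if not 0 <= x <= 1:                      # B1
                return False
            if b <= a and x != 1:                    # B2
                return False
            if not (a & b) and x != 0:               # B3
                return False
            if x + belief(omega - a, b) != 1:        # B4 (sum rule)
                return False
            for c in nonempty:                       # B5 (product rule)
                if a & c and belief(a & b, c) != belief(a, c) * belief(b, a & c):
                    return False
    return True


omega = frozenset({1, 2, 3})
weights = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
B = make_belief(weights)
print(satisfies_axioms(B, omega))
print(B(frozenset({1}), frozenset({1, 2})))
```

Exact `Fraction` arithmetic is used so that the equalities in the sum and product rules can be tested without floating-point tolerance.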



1.3 Bayes' Rule 

We now return to the central topic of this chapter. Suppose the reasoner has 
uncertainty over a set of competing hypotheses about the world. Subsequently, 
he makes an observation. He can use this observation to update his beliefs about 
the hypotheses. The following theorem explains how to carry out this update. 

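Theorem 1 (Bayes' Rule). Let $(\Omega, \mathcal{F}, \mathbf{B})$ be a belief space. Let $\{H_1, \ldots, H_N\}$
be a partition of $\Omega$, and let $D \in \mathcal{F}$ be an event such that $\mathbf{B}(D) > 0$. Then, for
all $n \in \{1, \ldots, N\}$,

\[
\mathbf{B}(H_n|D) = \frac{\mathbf{B}(D|H_n) \, \mathbf{B}(H_n)}{\mathbf{B}(D)}
= \frac{\mathbf{B}(D|H_n) \, \mathbf{B}(H_n)}{\sum_m \mathbf{B}(D|H_m) \, \mathbf{B}(H_m)}.
\]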

The interpretation is as follows. The $H_1, \ldots, H_N$ represent $N$ mutually
exclusive hypotheses, and the event $D$ represents a new observation or data.
Initially, the reasoner holds a prior belief $\mathbf{B}(H_n)$ over each hypothesis $H_n$.
Subsequently, he incorporates the observation of the event $D$ and arrives at a
posterior belief $\mathbf{B}(H_n|D)$ over each hypothesis $H_n$. Bayes' rule states that
this update can be seen as combining the prior belief $\mathbf{B}(H_n)$ with the
likelihood $\mathbf{B}(D|H_n)$ of observation $D$ under hypothesis $H_n$. The denominator
$\sum_m \mathbf{B}(D|H_m) \, \mathbf{B}(H_m) = \mathbf{B}(D)$ just plays the role of a normalizing constant
(Figure 3).
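
The update of Theorem 1 can be sketched in a few lines. The function name `bayes_update`, the hypothesis labels and all numerical values below are illustrative assumptions, chosen so that a roughly uniform prior is shifted toward the third hypothesis.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior over hypotheses: likelihood times prior, renormalized.

    `prior[h]` plays the role of B(H_n) and `likelihood[h]` of B(D|H_n).
    """
    joint = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(joint.values())          # B(D), the normalizing constant
    if evidence == 0:
        raise ValueError("Bayes' rule requires B(D) > 0")
    return {h: j / evidence for h, j in joint.items()}


# Hypothetical numbers: a uniform prior over three hypotheses.
prior = {"H1": Fraction(1, 3), "H2": Fraction(1, 3), "H3": Fraction(1, 3)}
likelihood = {"H1": Fraction(1, 10), "H2": Fraction(2, 10), "H3": Fraction(7, 10)}

posterior = bayes_update(prior, likelihood)
for h, p in posterior.items():
    print(h, p)
```

With a uniform prior the posterior is simply the normalized likelihood, which makes the role of the denominator as a normalizing constant easy to see.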

Bayes' rule naturally applies to a sequential setting. Incorporating a new
observation $D_t$ after having observed $D_1, D_2, \ldots, D_{t-1}$ updates the beliefs as

\[
\mathbf{B}(H_n|D_1 \cap \cdots \cap D_t)
= \frac{\mathbf{B}_t(D_t|H_n) \, \mathbf{B}_t(H_n)}{\sum_m \mathbf{B}_t(D_t|H_m) \, \mathbf{B}_t(H_m)},
\]


2 More precisely, the axioms of beliefs as stated here imply the axioms of probability for 
finitely additive measures over finite algebras. Furthermore, the axioms of beliefs also specify 
a unique version of the conditional probability measure. 
Figure 3: Schematic Representation of Bayes' Rule. The prior belief in hypotheses
$H_1$, $H_2$ and $H_3$ is roughly uniform. After conditioning on the observation $D$,
the belief in hypothesis $H_3$ increases significantly.

where for the $t$-th update,

\[
\mathbf{B}_t(H_n) := \mathbf{B}(H_n|D_1 \cap \cdots \cap D_{t-1})
\quad \text{and} \quad
\mathbf{B}_t(D_t|H_n) := \mathbf{B}(D_t|H_n \cap D_1 \cap \cdots \cap D_{t-1})
\]

play the role of the prior belief and the likelihood respectively. Note that

\[
\mathbf{B}(D_1 \cap \cdots \cap D_t|H_n) = \prod_{\tau=1}^{t} \mathbf{B}(D_\tau|H_n \cap D_1 \cap \cdots \cap D_{\tau-1}),
\]

and hence each hypothesis $H_n$ naturally determines a probability measure $\mathbf{B}(\cdot|H_n)$
over sequences of observations.
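
The sequential form can be checked numerically against conditioning on the joint observation in one step. The sketch below assumes a uniform belief over eight outcomes, a hypothetical three-cell partition into hypotheses, and two hypothetical observations; the names `belief` and `sequential_posterior` are illustrative.

```python
from fractions import Fraction

# Hypothetical setup: eight equally weighted outcomes and three hypotheses.
weights = {w: Fraction(1, 8) for w in range(8)}
OMEGA = frozenset(weights)
hypotheses = {
    "H1": frozenset({0, 1, 2}),
    "H2": frozenset({3, 4}),
    "H3": frozenset({5, 6, 7}),
}


def belief(a, b):
    """B(A|B) as a ratio of masses; assumes b has positive mass."""
    return sum(weights[w] for w in a & b) / sum(weights[w] for w in b)


def sequential_posterior(observations):
    """Apply Bayes' rule once per observation D_1, ..., D_t."""
    seen = OMEGA                                   # D_1 ∩ ... ∩ D_{t-1}
    post = {h: belief(e, OMEGA) for h, e in hypotheses.items()}
    for d in observations:
        # Likelihood B_t(D_t|H_n) = B(D_t | H_n ∩ D_1 ∩ ... ∩ D_{t-1}).
        # If H_n is already excluded its posterior is 0, so any value works.
        lik = {h: belief(d, e & seen) if e & seen else Fraction(0)
               for h, e in hypotheses.items()}
        norm = sum(lik[h] * post[h] for h in post)
        post = {h: lik[h] * post[h] / norm for h in post}
        seen = seen & d
    return post


D1 = frozenset({0, 1, 2, 3, 5})
D2 = frozenset({1, 2, 3, 6})

step_by_step = sequential_posterior([D1, D2])
in_one_step = {h: belief(e, D1 & D2) for h, e in hypotheses.items()}
print(step_by_step == in_one_step)
```

Both routes give the same posterior, which reflects the fact that the posterior after each step serves as the prior for the next one.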




Figure 4: Progressive refinement of the accuracy of the joint observation. The
sequence of observations $D_1, \ldots, D_5$ leads to refinements $S_1, S_2, \ldots, S_5$, where
$S_t = D_1 \cap \cdots \cap D_t$. Note that $S_5 \subset H_1$ and therefore $\mathbf{B}(H_1|S_5) = 1$, while
$\mathbf{B}(H_2|S_5) = \mathbf{B}(H_3|S_5) = 0$.



A smaller event $D$ corresponds to a more "accurate" observation. Hence,
making a new observation $D'$ can only improve the accuracy, since

\[
D \supset D \cap D'.
\]

In some cases, the accuracy of an observation (or sequence of observations) can
be so high that it uniquely identifies a hypothesis (Figure 4).

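The way Bayes' rule operates can be illustrated as follows. Consider a
partition $\{X_1, \ldots, X_K\}$ of $\Omega$ and let $H_* \in \{H_1, \ldots, H_N\}$ be the true hypothesis,
i.e. the outcome $\omega \in \Omega$ is drawn obeying propensities described by $\mathbf{B}(\cdot|H_*)$.
The $X_k$ represent different observations the reasoner can make. If $\omega$ is drawn
and reported to be in $X_k$, then the log-posterior probability of hypothesis $H_n$
is given by

\[
\log \mathbf{B}(H_n|X_k)
= \underbrace{\log \mathbf{B}(X_k|H_n)}_{l_n}
+ \underbrace{\log \mathbf{B}(H_n)}_{p_n}
- \underbrace{\log \mathbf{B}(X_k)}_{c}.
\]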

This decomposition highlights all the relevant terms for understanding Bayesian
learning. The term $l_n$ is the log-likelihood of the data $X_k$. The term $p_n$ is the
log-prior of hypothesis $H_n$, which is a way of representing the relative confidence in
hypothesis $H_n$ prior to seeing the data. In practice, it can also be interpreted as
(a) a complexity term, (b) the log-posterior resulting from "previous" inference
steps, or (c) an initialization term for the inference procedure. The term $c$ is the
log-probability of the data, which is constant over the hypotheses, and thus does
not affect our analysis. Hence, log-posteriors are compared by their differences
in $l_n + p_n$. Ideally, the log-posterior should be maximum for the true hypothesis
$H_n = H_*$. However, since $\omega$ is chosen randomly, the log-posterior $\log \mathbf{B}(H_n|X_k)$
is a random quantity. If its variance is high enough, then a particular realization
of the data can lead to a log-posterior favoring some "wrong" hypotheses over
the true hypothesis, i.e. $l_n + p_n > l_* + p_*$ for some $H_n \neq H_*$. In general, this
is an unavoidable problem (that necessarily haunts every statistical inference
method). Further insight can be gained by analyzing the expected log-posterior:

\[
\underbrace{\sum_k \mathbf{B}(X_k|H_*) \log \mathbf{B}(X_k|H_n)}_{L_n}
+ \underbrace{\log \mathbf{B}(H_n)}_{P_n = p_n}
- \underbrace{\sum_k \mathbf{B}(X_k|H_*) \log \mathbf{B}(X_k)}_{C}.
\]

This reveals³ that, on average, the log-likelihood $L_n$ is indeed maximized by
$H_n = H_*$. Hence, the posterior belief will, on average, concentrate its mass on
the hypotheses having high $L_n + P_n$.
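
The claim about the expected log-likelihood can be checked numerically. The sketch below assumes three hypothetical hypotheses, each assigning strictly positive probabilities to three observations, and computes the expected log-likelihood of every hypothesis under data drawn from the true one; the function name and the numbers are illustrative.

```python
import math


def expected_log_likelihood(true_dist, model_dist):
    """Expected log-likelihood: sum over k of p_k * log(q_k).

    `true_dist[k]` stands for B(X_k|H_*) and `model_dist[k]` for B(X_k|H_n).
    Assumes q_k > 0 wherever p_k > 0.
    """
    return sum(p * math.log(q) for p, q in zip(true_dist, model_dist) if p > 0)


# Hypothetical likelihoods B(X_k|H_n) over three possible observations.
likelihoods = {
    "H1": [0.7, 0.2, 0.1],
    "H2": [0.3, 0.4, 0.3],
    "H3": [0.1, 0.2, 0.7],
}
true_hypothesis = "H2"

L = {h: expected_log_likelihood(likelihoods[true_hypothesis], q)
     for h, q in likelihoods.items()}
best = max(L, key=L.get)
print(best)  # the hypothesis with the largest expected log-likelihood
```

Changing `true_hypothesis` to any other label moves the maximum accordingly, which is the content of the inequality stated in the footnote.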



1.4 Conditioning on Events with Zero Belief 

There is one technical point that merits closer inspection. Consider two events
$A, B \in \mathcal{F}$ such that $B \cap A \neq \emptyset$ but $\mathbf{B}(B) = 0$. One has that

\[
\mathbf{T}(A|B) =
\begin{cases}
1 & \text{if } B \subset A, \\
? & \text{else}
\end{cases}
\quad \text{and} \quad
\mathbf{B}(A \cap B) = \mathbf{B}(B) \, \mathbf{B}(A|B) = 0
\]

due to the definition of the truth function and due to Axiom B5. From this,
we conclude that $\mathbf{B}(A \cap B) = 0$. For $\mathbf{B}(A|B)$ there are two possible cases. If
$B \subset A$, then $\mathbf{B}(A|B) = 1$ due to Axiom B2. However, if $B \not\subset A$, then $\mathbf{B}(A|B)$
is independent of the degree of belief $\mathbf{B}(C)$ of any event $C \in \mathcal{F}$. More generally,
if $D \in \mathcal{F}$ is such that $\mathbf{B}(B|D) = 0$, then the value of $\mathbf{B}(A|B)$ is independent of
the degree of belief $\mathbf{B}(C|D)$ of any event $C \in \mathcal{F}$.

3 For probabilities $p_i$, $q_i$, the sum $\sum_i p_i \log q_i$ is maximum when $q_i = p_i$ for fixed $p_i$.

The bottom line is that conditioning on an event with zero belief is a
well-defined operation under the belief axioms outlined in Definition 2. This is not
so in the case of the probability axioms of measure theory. In measure theory,
the probability measure is a global measure $\mu$ over $\mathcal{F}$, i.e. a function assigning
probability mass $\mu(A)$ to any event $A \in \mathcal{F}$. However, implicit in this definition
is the fact that these masses are measured w.r.t. the certain event $\Omega$. Because
of this, the information contained in the probability measure $\mu$ is insufficient
to uniquely determine the conditional probability measure $\mu(\cdot|B)$ arising from
conditioning on an event $B \in \mathcal{F}$ having $\mu(B) = 0$. In contrast, the belief
function $\mathbf{B}$ is a well-defined measure w.r.t. any conditioning event $B \in \mathcal{F}$, i.e.
assigning probability mass $\mathbf{B}(A|B)$ to any event $A \in \mathcal{F}$.

2 Causality 

Suppose there is an unknown cause influencing a result we are waiting for. As 
soon as we observe the result, we learn something about the unknown cause. 
However, if instead we decide to interrupt the natural regime of the process by 
choosing the result ourselves, then our knowledge about the unknown cause will 
not change. This is simply because we know that our current actions cannot 
change the past anymore. Meanwhile, in both cases, we learn something about 
the future, i.e. about all the outcomes that will follow the result. 

This distinction between belief updates following externally generated obser- 
vations and internally generated actions is not modeled in Bayesian probability 
theory. Essentially, the theory lacks the formal tools to deal with indeterminate 
outcomes chosen by the reasoner himself. This requires introducing additional 
information to clearly identify the past and the future of choices, or more ab- 
stractly speaking, introducing a causal order of events. 

3 Causal Spaces 

The aim of this section is to introduce causal spaces. Causal spaces contain 
enough information to characterize the causal structure of a random process. 

Let $\Omega$ be a finite set of outcomes. An atom set $\mathcal{A}$ is a partition of $\Omega$, and
an atom is a member $A \in \mathcal{A}$. Given a set $\mathcal{E}$ of subsets of $\Omega$, define the algebra
generated by $\mathcal{E}$, written $\sigma(\mathcal{E})$, as the smallest algebra over $\Omega$ containing every
member of $\mathcal{E}$. Furthermore, define the atom set generated by an algebra $\mathcal{F}$,
written $\alpha(\mathcal{F})$, as the largest set of atoms containing members of $\mathcal{F}$. For any set
$\mathcal{E}$ of subsets of $\Omega$, we also abbreviate $\alpha(\mathcal{E}) := \alpha(\sigma(\mathcal{E}))$.

Remark 5. In the finite case, it is easily seen that both generated algebras and
generated atom sets are unique.

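Definition 4 (Primitive Events). Let $E = (E_0, E_1, E_2, \ldots, E_N)$ be a finite
sequence of subsets of $\Omega$ called primitive events, where $E_0 := \Omega$, and where
for all $n \geq 1$,

\[
E_n \notin \sigma(\{E_0, E_1, \ldots, E_{n-1}\}).
\]

Furthermore, define $\mathcal{E}_n := \{E_n, E_n^c\}$ and $\mathcal{A}_0, \mathcal{A}_1, \ldots, \mathcal{A}_N$ as the sequence of
atom sets

\[
\mathcal{A}_n := \alpha(\{E_0, E_1, \ldots, E_n\}).
\]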

This setup is illustrated in Figure 5. The sequence of primitive events is
an abstract characterization of a random process that occurs in discrete steps
$n = 1, 2, \ldots, N$. Each step $n$ is associated with a primitive event $E_n$ representing
a basic proposition whose truth value is resolved during this step (and not
before!), i.e. step $n$ determines whether the outcome $\omega \in \Omega$ is either in $E_n$ or
in $E_n^c$. The $n$-th atom set $\mathcal{A}_n$ contains one proposition for each possible path
the random process can take. Therefore, after $n$ steps, the process will find itself
in one (and only one) of the members in $\mathcal{A}_n$.
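
The construction of the atom sets can be sketched as a successive refinement of partitions: each primitive event splits every atom of the previous step into the part inside the event and the part outside it. The function name `atom_sets` and the example events are hypothetical, and the check that each event lies outside the algebra generated so far is implemented here, as an assumption of this sketch, by requiring that at least one atom is actually split.

```python
def atom_sets(omega, primitive_events):
    """Return the list [A_0, A_1, ..., A_N] of atom sets.

    `primitive_events` holds E_1, ..., E_N; E_0 is omega itself.
    """
    omega = frozenset(omega)
    atoms = [omega]                      # A_0 consists of the single atom omega
    sequence = [set(atoms)]
    for e in primitive_events:
        e = frozenset(e)
        refined = []
        for atom in atoms:
            for part in (atom & e, atom - e):
                if part:                 # drop empty pieces
                    refined.append(part)
        if len(refined) == len(atoms):
            # No atom was split, so e is a union of existing atoms and
            # hence already expressible from the previous events.
            raise ValueError("event lies in the algebra generated so far")
        atoms = refined
        sequence.append(set(atoms))
    return sequence


omega = frozenset(range(6))
E1, E2, E3 = frozenset({0, 1, 2}), frozenset({0, 1}), frozenset({1, 4})

for n, atoms in enumerate(atom_sets(omega, [E1, E2, E3])):
    print(n, sorted(sorted(a) for a in atoms))
```

Each atom of the last set corresponds to one complete path of the process, i.e. one combination of truth values of the primitive events that is logically possible.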



Remark 6. The condition that $E_n$ cannot be in the algebra generated by the
previous events $E_0, \ldots, E_{n-1}$ guarantees that $E_n$ adds a new proposition that
cannot be expressed in terms of the previous propositions.

The sequence of primitive events $E = (E_1, \ldots, E_N)$ can equivalently be
represented by any sequence $E' = (E'_1, \ldots, E'_N)$ where $E'_n \in \mathcal{E}_n$. Due to this,
we will call any member of $\mathcal{E}_n$ a primitive event. We now introduce causal functions.






Figure 5: Primitive Events and their Atom Sets. 
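Definition 5 (Causal Axioms). Let $\Omega$ be a set of outcomes, and let
$E = (E_1, \ldots, E_N)$ be a sequence of primitive events. A set function $\mathbf{C}_n$ is an $n$-th
causal function iff

C1. $A \in \mathcal{E}_n$, $B \in \mathcal{A}_{n-1}$, $\mathbf{C}_n(A|B) \in [0, 1]$.

C2. $A \in \mathcal{E}_n$, $B \in \mathcal{A}_{n-1}$, $\mathbf{C}_n(A|B) = 1$ if $B \subset A$.

C3. $A \in \mathcal{E}_n$, $B \in \mathcal{A}_{n-1}$, $\mathbf{C}_n(A|B) = 0$ if $A \cap B = \emptyset$.

C4. $A \in \mathcal{E}_n$, $B \in \mathcal{A}_{n-1}$, $\mathbf{C}_n(A|B) + \mathbf{C}_n(A^c|B) = 1$.

Hence, $\mathbf{C}_n$ maps $\mathcal{E}_n \times \mathcal{A}_{n-1}$ into [0, 1]. A causal function over $E$ is a function

\[
\mathbf{C}(A|B) = \mathbf{C}_n(A|B), \quad \text{if } A \in \mathcal{E}_n, \; B \in \mathcal{A}_{n-1},
\]

where $\mathbf{C}_n$ is an $n$-th causal function. Hence, $\mathbf{C}$ maps $\bigcup_n (\mathcal{E}_n \times \mathcal{A}_{n-1})$ into [0, 1].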

The intuition behind this definition is as follows. The causal function
specifies the knowledge the reasoner has about the evolution of a random process. It
specifies the likelihood of a primitive event $A \in \mathcal{E}_n$ to happen after the random
process is known to have taken a path $B \in \mathcal{A}_{n-1}$.

By comparing Axioms C1-C4 with Axioms B1-B5 (Section 1.2) of belief
functions, we observe the following. First, in contrast to $\mathbf{B}$, only a subset of
combinations $(A, B) \in \mathcal{F} \times \mathcal{F}$ is specified for $\mathbf{C}$, namely, the ones that chain
a history of primitive events $B \in \mathcal{A}_{n-1} \subset \mathcal{F}$ together with the primitive event
$A \in \mathcal{E}_n \subset \mathcal{F}$ that immediately follows. Second, Axioms C1-C4 play the same
role as Axioms B1-B4, namely: (C1) probabilities lie in the unit interval [0, 1];
(C2 & C3) probabilities are consistent with the truth function; and (C4)
probabilities of complementary events add up to one. No axiom analogous to
Axiom B5 is needed for $\mathbf{C}$.

Putting everything together, one gets a causal space. A causal space contains 
enough information to derive an associated belief space. 

Definition 6 (Causal Space). A causal space is a tuple $(\Omega, E, \mathbf{C})$, where: $\Omega$ is
a set of outcomes, $E$ is a sequence of primitive events, and $\mathbf{C}$ is a causal function
over $E$.

Definition 7 (Induced Belief Space). Given a causal space $(\Omega, E, \mathbf{C})$, the
induced belief space is the belief space $(\Omega, \mathcal{F}, \mathbf{B})$ where the algebra $\mathcal{F}$ and the
belief function $\mathbf{B}$ are defined as

i. $\mathcal{F} = \sigma(\{E_0, E_1, \ldots, E_N\})$,

ii. $\mathbf{B}(A|B) = \mathbf{C}(A|B)$, for all $(A, B) \in \bigcup_n (\mathcal{E}_n \times \mathcal{A}_{n-1})$.

Thus, the induced belief space is constructed by generating the algebra $\mathcal{F}$
from the primitive events $E$, and by equating the belief function $\mathbf{B}$ to the causal
function $\mathbf{C}$ over the subset of $\mathcal{F} \times \mathcal{F}$ where $\mathbf{C}$ is defined. The following theorem
tells us that this subset is enough to completely determine the whole belief
function.
Theorem 2. The induced belief space exists and is unique.

Proof. Let $\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_N$ denote the sequence of algebras generated as

\[
\mathcal{F}_n := \sigma(\{E_0, E_1, \ldots, E_n\}).
\]

Let $r, s \in \mathbb{N}$, $r < s$, be the smallest numbers such that $B$ is $\mathcal{F}_r$-measurable and
$A$ is $\mathcal{F}_s$-measurable. Let $\mathcal{B} \subset \mathcal{A}_r$ and $\mathcal{A} \subset \mathcal{A}_s$ be the partitions of $B$ and $A$
respectively. Then, $\mathbf{B}(A|B) = 0$ if $A \cap B = \emptyset$ by the belief axioms, and

\[
\mathbf{B}(A|B) = \sum_{a \in \mathcal{A}} \mathbf{B}(a|B)
\]

otherwise, because the members $a$ of $\mathcal{A}$ are disjoint. For every $a \in \mathcal{A}$, let $b \in \mathcal{B}$
be the unique member of the partition of $B$ such that $a \subset b$. Obviously,

\[
\mathbf{B}(a|B) = \mathbf{B}(a|b),
\]

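because $a \cap B = a \cap b$. Let $a^1, a^2, \ldots, a^s$ be the unique sequence $a^j \in \mathcal{E}_j$ such that

\[
a = a^1 \cap a^2 \cap \cdots \cap a^s = \bigcap_{j=1}^{s} a^j = b \cap \bigcap_{j=r+1}^{s} a^j,
\]

where the last equality comes from $b = a^1 \cap \cdots \cap a^r$. Hence,

\[
\mathbf{B}(a|b)
= \mathbf{B}\Bigl( \bigcap_{j=r+1}^{s} a^j \Bigm| b \Bigr)
= \prod_{j=r+1}^{s} \mathbf{B}\Bigl( a^j \Bigm| b \cap \bigcap_{i=r+1}^{j-1} a^i \Bigr)
= \prod_{j=r+1}^{s} \mathbf{C}\Bigl( a^j \Bigm| b \cap \bigcap_{i=r+1}^{j-1} a^i \Bigr).
\]

The last replacement can be done because $a^j \in \mathcal{E}_j$ and $b \cap \bigcap_{i=r+1}^{j-1} a^i \in \mathcal{A}_{j-1}$.
Thus, we have proven the following. First, $\mathcal{F}$ is unique because generated
algebras are unique. Second, we have shown, for arbitrarily chosen events $A, B \in \mathcal{F}$,
how to reexpress $\mathbf{B}(A|B)$ into an expression involving only terms of the form
$\mathbf{C}(C|D)$. Hence, it cannot be that $\mathbf{B}, \mathbf{B}'$ are both consistent with $\mathbf{C}$ and there
is $A, B \in \mathcal{F}$ such that $\mathbf{B}(A|B) \neq \mathbf{B}'(A|B)$. □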
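
The chain of products used in the proof can be turned into a short computation. The sketch below assumes a hypothetical representation in which the causal function is a dictionary keyed by (primitive event, history atom) pairs, storing only the values that the logical constraints leave open; the names `causal_value` and `chain_belief` and the example numbers are illustrative.

```python
from fractions import Fraction


def causal_value(cprob, event, history):
    """Causal function value with the logical constraints C2 and C3 applied."""
    if history <= event:
        return Fraction(1)
    if not (history & event):
        return Fraction(0)
    return cprob[(event, history)]


def chain_belief(omega, events, cprob, atom, r=0):
    """Belief in `atom` (an atom of the last atom set) given its ancestor
    atom b in the r-th atom set, computed as a product of causal values."""
    omega = frozenset(omega)
    path = omega
    for e in events[:r]:                 # rebuild b from the first r steps
        path = path & e if atom <= e else path - e
    belief = Fraction(1)
    for e in events[r:]:                 # one factor per remaining step
        p = causal_value(cprob, e, path)
        if atom <= e:
            belief, path = belief * p, path & e
        else:                            # the atom lies in the complement (C4)
            belief, path = belief * (1 - p), path - e
    return belief


# Hypothetical causal space with two primitive events on four outcomes.
omega = frozenset({0, 1, 2, 3})
E1, E2 = frozenset({0, 1}), frozenset({0, 2})
cprob = {
    (E1, omega): Fraction(1, 2),
    (E2, E1): Fraction(1, 4),
    (E2, omega - E1): Fraction(3, 4),
}

atoms = [frozenset({w}) for w in sorted(omega)]
unconditional = {a: chain_belief(omega, [E1, E2], cprob, a) for a in atoms}
for a, p in unconditional.items():
    print(sorted(a), p)
```

Passing `r=1` conditions on the atom reached after the first step, so that only the factors of the later steps remain in the product.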

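We now define the operation that specifies how the knowledge about the
random process transforms when the reasoner himself intervenes in it.

Definition 8 (Intervention). Given a causal space $(\Omega, E, \mathbf{C})$ and a primitive
event $A \in \mathcal{E}_n$ for some $n \in \{1, \ldots, N\}$, the $A$-intervention is the causal space
$(\Omega, E, \mathbf{C}')$ where for all $(B, C) \in \bigcup_n (\mathcal{E}_n \times \mathcal{A}_{n-1})$,

\[
\mathbf{C}'(B|C) =
\begin{cases}
1 & \text{if } A = B \text{ and } (B \cap C) \notin \{\emptyset, C\}, \\
0 & \text{if } A = B^c \text{ and } (B \cap C) \notin \{\emptyset, C\}, \\
\mathbf{C}(B|C) & \text{else.}
\end{cases}
\]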
This is an important definition. The reasoner asks himself the question:
"How do my beliefs about the world change if I were to choose the truth value
of a primitive event?" This is answered by directly changing the causal
function accordingly (Figure 6). However, this change cannot contradict the logical
constraints given by the underlying truth function.

Remark 7. Note that $(B \cap C) \notin \{\emptyset, C\} \Leftrightarrow \mathbf{T}(B|C) = ?$. Hence, an intervention
can only affect primitive propositions $B \in \mathcal{E}_n$ that have an unresolved truth value
given the history $C \in \mathcal{A}_{n-1}$. Moreover, the intervention resolves the truth value
of $B$. This makes intuitive sense.
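
The difference between observing and choosing a primitive event can be illustrated on a small example. The sketch below assumes a hypothetical causal space with two primitive events on four outcomes, stores only the unresolved causal values in a dictionary, and assumes that the final atoms are single outcomes; the names `intervene`, `outcome_beliefs` and `belief` are illustrative.

```python
from fractions import Fraction

OMEGA = frozenset({0, 1, 2, 3})
E1, E2 = frozenset({0, 1}), frozenset({0, 2})   # E1: "cause", E2: "result"
EVENTS = [E1, E2]

# Only unresolved (event, history) pairs are stored (hypothetical values).
CAUSAL = {
    (E1, OMEGA): Fraction(1, 2),
    (E2, E1): Fraction(1, 4),
    (E2, OMEGA - E1): Fraction(3, 4),
}


def intervene(cprob, event, value=1):
    """Set every unresolved causal value of `event` to `value`.

    value=1 chooses the event itself; value=0 chooses its complement.
    """
    return {(e, h): (Fraction(value) if e == event else p)
            for (e, h), p in cprob.items()}


def outcome_beliefs(cprob):
    """Belief in each single outcome as a product of causal values."""
    beliefs = {}
    for w in OMEGA:
        path, b = OMEGA, Fraction(1)
        for e in EVENTS:
            if path <= e:
                p = Fraction(1)
            elif not (path & e):
                p = Fraction(0)
            else:
                p = cprob[(e, path)]
            if w in e:
                b, path = b * p, path & e
            else:
                b, path = b * (1 - p), path - e
        beliefs[w] = b
    return beliefs


def belief(dist, a, b=OMEGA):
    """Conditional belief from outcome masses; assumes b has positive mass."""
    return sum(dist[w] for w in a & b) / sum(dist[w] for w in b)


before = outcome_beliefs(CAUSAL)
after = outcome_beliefs(intervene(CAUSAL, E2))

print(belief(before, E1))        # prior belief in the cause
print(belief(before, E1, E2))    # after observing the result
print(belief(after, E1))         # after choosing the result
```

In this example, observing `E2` changes the belief in the earlier event `E1`, whereas intervening on `E2` leaves it at its prior value while making `E2` itself certain.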

Figure 6: An Intervention. The primitive events $E = (E_0, E_1, E_2, E_3)$ are sets
on the unit interval. Panels (a) and (b) show the causal space before and after
an intervention respectively. This representation shows the atom sets $\mathcal{A}_0$ to
$\mathcal{A}_3$ and conditional probabilities (given by the relative lengths).

We will use the abbreviation $\hat{A}$ to denote $A$-interventions on a causal space.
When the underlying causal space $(\Omega, E, \mathbf{C})$ inducing a belief space $(\Omega, \mathcal{F}, \mathbf{B})$ is
clear from the context, then the expression $\mathbf{B}(B|\hat{A})$ denotes the belief $\mathbf{B}'(B|A)$
measured w.r.t. the belief space $(\Omega, \mathcal{F}, \mathbf{B}')$ induced by the $A$-intervention of
$(\Omega, E, \mathbf{C})$. Furthermore, when $A \in \mathcal{F}$ is an event such that

\[
A = \bigcap_{i=1}^{I} A_i,
\]

where each $A_i$ is a primitive event, then the $A$-intervention is the causal space
resulting from the succession of $A_i$-interventions.

4 Concluding Remarks 

We have shown how to derive a simple framework for reasoning under
uncertainty and intervention. This is achieved in three steps. First, we have restated
logic in set-theoretic terms to obtain a framework for reasoning under certainty.
Second, we have extended this framework to model reasoning under uncertainty.
Finally, we have introduced causal spaces and shown how they provide enough
information to model knowledge containing causal information about the world.
This framework can be extended in many ways. Importantly, it has been
designed to be consistent with the literature on Bayesian statistics [Cox, 1961;
Jaynes and Bretthorst, 2003] and the literature on causality based on graphs
[Pearl, 2000; Spirtes et al., 2000; Dawid, 2010] and probability trees [Shafer,
1996].



References 

R.T. Cox. The Algebra of Probable Inference. Johns Hopkins, 1961. 

A. P. Dawid. Beware of the DAG! Journal of Machine Learning Research, (to appear), 
2010. 

E.T. Jaynes and L.G. Bretthorst. Probability Theory: The Logic of Science.
Cambridge University Press, 2003.

J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 
Cambridge, UK, 2000. 

G. Shafer. The art of causal conjecture. The MIT Press, 1996. 

P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction and Search. Springer- 
Verlag, New York, 2nd edition, 2000. 






